Machine Learning Techniques Course Notes -- Support Vector Regression


Author (id: redstonewill)


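The previous lecture covered Kernel Logistic Regression and how to apply SVM techniques to soft-binary classification. One approach is 2-level learning: first run SVM to obtain the parameters b and w, then fine-tune b and w iteratively with a general-purpose logistic regression optimizer to reach the best solution. We also saw that, through the Representer Theorem, the SVM kernel trick can be introduced in z-space to solve logistic regression directly. This lecture extends that material and discusses how to apply the SVM kernel trick to regression problems.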


Kernel Ridge Regression 


First, recall the Representer Theorem from the previous lecture: for any L2-regularized linear model, the optimal w can be written as a linear combination of the z_n. This is what lets us introduce the kernel trick and kernelize the model.


for any L2-regularized linear model

$$\min_{w} \; \frac{\lambda}{N} w^T w + \frac{1}{N} \sum_{n=1}^{N} \mathrm{err}(y_n, w^T z_n)$$

optimal $w_* = \sum_{n=1}^{N} \beta_n z_n$.
=> any L2-regularized linear model can be kernelized!


How do we turn a regression model into kernel form? The error measure most often used for linear/ridge regression is the squared error, err(y, w^T z) = (y - w^T z)^2. It has an analytic solution: linear least squares gives the optimum directly through matrix operations. Next we study how to bring the kernel into ridge regression and obtain the corresponding analytic solution.


We first write down the Kernel Ridge Regression problem:


solving ridge regression

$$\min_{w} \; \frac{\lambda}{N} w^T w + \frac{1}{N} \sum_{n=1}^{N} (y_n - w^T z_n)^2$$

yields optimal solution $w_* = \sum_{n=1}^{N} \beta_n z_n$


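Here the optimal w_* must be a linear combination of the z_n. Substituting w_* = \sum_n \beta_n z_n into ridge regression and replacing the inner products of z with the kernel turns the problem of finding w_* into the problem of finding the \beta_n: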


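without loss of generality, can solve for optimal $\beta$ instead of $w$

$$\min_{\beta} \; \frac{\lambda}{N} \sum_{n=1}^{N} \sum_{m=1}^{N} \beta_n \beta_m K(x_n, x_m) + \frac{1}{N} \sum_{n=1}^{N} \Big( y_n - \sum_{m=1}^{N} \beta_m K(x_n, x_m) \Big)^2$$

first term: regularization of $\beta$ on K-based regularizer; second term: linear regression of $\beta$ on K-based features

$$= \frac{\lambda}{N} \beta^T K \beta + \frac{1}{N} \left( \beta^T K^T K \beta - 2 \beta^T K^T y + y^T y \right)$$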


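Ridge regression can thus be written in matrix form. The first term can be viewed as a regularizer on \beta and the second term as the error function of \beta. Our goal is to find the \beta that minimizes this expression, which solves the kernel ridge regression problem.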


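The problem of solving for \beta can be written as follows:

$$E_{aug}(\beta) = \frac{\lambda}{N} \beta^T K \beta + \frac{1}{N} \left( \beta^T K^T K \beta - 2 \beta^T K^T y + y^T y \right)$$

$$\nabla E_{aug}(\beta) = \frac{2}{N} \left( \lambda K^T I \beta + K^T K \beta - K^T y \right) = \frac{2}{N} K^T \left( (\lambda I + K) \beta - y \right)$$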
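The gradient formula can be sanity-checked numerically. The following is a minimal sketch, assuming a Gaussian kernel and small random data; the helper names `gaussian_kernel`, `e_aug`, and `grad_e_aug` are illustrative and not from the course. It compares the closed-form gradient with a central finite difference.

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=1.0):
    # K[i, j] = exp(-gamma * ||X1[i] - X2[j]||^2)
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def e_aug(beta, K, y, lam):
    # (lam/N) beta^T K beta + (1/N) ||y - K beta||^2
    N = len(y)
    return lam / N * beta @ K @ beta + np.sum((y - K @ beta) ** 2) / N

def grad_e_aug(beta, K, y, lam):
    # (2/N) K^T ((lam I + K) beta - y)
    N = len(y)
    return 2.0 / N * K.T @ ((lam * np.eye(N) + K) @ beta - y)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))      # assumed toy inputs
y = rng.normal(size=6)           # assumed toy targets
K = gaussian_kernel(X, X)
beta = rng.normal(size=6)
lam = 0.5

h = 1e-6
numeric = np.array([
    (e_aug(beta + h * e, K, y, lam) - e_aug(beta - h * e, K, y, lam)) / (2 * h)
    for e in np.eye(6)
])
max_gap = np.max(np.abs(numeric - grad_e_aug(beta, K, y, lam)))
print("largest difference between numeric and analytic gradient:", max_gap)
```

Because E_aug is quadratic in \beta, the central difference is exact up to rounding, so the two gradients should agree closely.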


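E_aug(\beta) is a quadratic polynomial in \beta. To minimize it, a convex quadratic optimization problem, we only need to compute its gradient and set the gradient to zero. \nabla E_aug(\beta) is written out above; setting it to zero gives one possible analytic solution for \beta:


$$\beta = (\lambda I + K)^{-1} y$$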
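The question here is whether the inverse of (\lambda I + K) exists. It does. As discussed earlier, a kernel matrix K satisfying Mercer's condition is positive semi-definite, and \lambda > 0, so (\lambda I + K) is always invertible. In terms of time complexity, (\lambda I + K) is N x N, so computing the solution costs O(N^3). One more point: \nabla E_aug(\beta) is a product of two factors, one of which is K. Could K be zero? K represents inner products in z-space, and an inner product is zero only when two vectors are orthogonal, so in general the entries of K are nonzero. For the same reason (\lambda I + K) is a dense matrix, and most entries of the solution \beta are nonzero. We return to this property later.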
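As an illustration of the closed-form solution, here is a minimal NumPy sketch of kernel ridge regression. It assumes a Gaussian kernel and a noisy sine curve as toy data; the function names `krr_fit` and `krr_predict` and the values of `lam` and `gamma` are illustrative choices, not part of the course material.

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=1.0):
    # K[i, j] = exp(-gamma * ||X1[i] - X2[j]||^2)
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit(X, y, lam=0.1, gamma=1.0):
    K = gaussian_kernel(X, X, gamma)
    # Solve (lam*I + K) beta = y instead of forming the inverse explicitly.
    return np.linalg.solve(lam * np.eye(len(y)) + K, y)

def krr_predict(X_train, beta, X_new, gamma=1.0):
    # g(x) = sum_n beta_n K(x_n, x)
    return gaussian_kernel(X_new, X_train, gamma) @ beta

rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]      # assumed toy inputs
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)     # assumed noisy targets

beta = krr_fit(X, y, lam=0.1, gamma=1.0)
y_hat = krr_predict(X, beta, X)
train_mse = np.mean((y_hat - y) ** 2)
# How well beta satisfies the linear system (lam*I + K) beta = y.
residual = np.max(np.abs((0.1 * np.eye(40) + gaussian_kernel(X, X)) @ beta - y))
print("training MSE:", train_mse)
```

The linear solve is the O(N^3) step discussed above; prediction touches every training point because \beta is dense.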


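So kernels let us solve non-linear regression problems. Next, we compare linear ridge regression with kernel ridge regression.


[Figure: linear ridge regression fit (left) versus kernel ridge regression fit (right)]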


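As the figure shows, linear ridge regression on the left gives a straight line, while kernel ridge regression on the right gives a curve, and the curve fits the data noticeably better. What are the strengths and weaknesses of the two? Linear ridge regression is a linear model and can only fit straight lines. Its training complexity is O(d^3 + d^2 N) and its prediction complexity is O(d), so it is the more efficient choice when N is much larger than d. Kernel ridge regression works in z-space through the kernel trick and yields a non-linear model, so it is more flexible. Its training complexity is O(N^3) and its prediction complexity is O(N), both depending only on N. When N is large the computation becomes expensive, so kernel ridge regression suits problems where N is not too large. In short, choosing between linear and kernel is a trade-off between efficiency and flexibility.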


linear ridge regression

$$w = (\lambda I + X^T X)^{-1} X^T y$$

• more restricted
• O(d^3 + d^2 N) training; O(d) prediction
  => efficient when N >> d


kernel ridge regression

$$\beta = (\lambda I + K)^{-1} y$$

• more flexible with K
• O(N^3) training; O(N) prediction
  => hard for big data


linear versus kernel:
trade-off between efficiency and flexibility
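The two closed forms can be compared directly. The sketch below assumes random toy data and uses the linear kernel K = X X^T, in which case kernel ridge regression should reproduce linear ridge regression exactly, since (\lambda I + X^T X)^{-1} X^T y = X^T (\lambda I + X X^T)^{-1} y. The only difference is whether a d x d or an N x N system is solved. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, lam = 50, 3, 0.5
X = rng.normal(size=(N, d))                                   # assumed toy inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)  # assumed toy targets

# Linear ridge regression: solve a d x d system.
w = np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

# Kernel ridge regression with the linear kernel K = X X^T: solve an N x N system.
K = X @ X.T
beta = np.linalg.solve(lam * np.eye(N) + K, y)

x_new = rng.normal(size=(5, d))
pred_linear = x_new @ w              # one length-d inner product per new point
pred_kernel = (x_new @ X.T) @ beta   # N kernel evaluations per new point

print(np.max(np.abs(pred_linear - pred_kernel)))
```

With a non-linear kernel the N x N route is the only one available, which is where the extra flexibility and the extra cost both come from.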


Support Vector Regression Primal 


In the Machine Learning Foundations course we saw that linear regression can be used for classification. The kernel ridge regression of the previous section can be used for classification in the same way. Kernel ridge regression applied to classification has its own name: least-squares SVM (LSSVM).


Let us first see how the results of soft-margin Gaussian SVM and Gaussian LSSVM differ on the same problem.














[Figure: decision boundaries and support vectors of soft-margin Gaussian SVM (left) and Gaussian LSSVM (right)]


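As the figure shows, if we look only at the decision boundary, soft-margin Gaussian SVM and Gaussian LSSVM differ very little: the boundaries are almost identical. The support vectors (the points marked with boxes) tell a different story. Soft-margin Gaussian SVM on the left has few SVs, while in Gaussian LSSVM on the right almost every point is an SV. In soft-margin Gaussian SVM most \alpha_n are zero and only a few points have \alpha_n > 0, so there are few SVs. For LSSVM, as the previous section explained, most entries of \beta are nonzero, so almost every point is an SV. Too many SVs cause a problem at prediction time: g(x) = \sum_n \beta_n K(x_n, x), so the more nonzero \beta_n there are, the more expensive g is to evaluate and the slower prediction becomes. For this reason soft-margin Gaussian SVM has the advantage.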


• LSSVM: similar boundary, many more SVs
  => slower prediction, dense $\beta$ (BIG g)


• dense $\beta$: LSSVM, kernel LogReg;
  sparse $\alpha$: standard SVM
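The density of \beta in LSSVM is easy to observe. The sketch below is only an illustration under assumed conditions: a Gaussian kernel, hypothetical XOR-like labels on random 2-D points, and \lambda = 0.1. It fits kernel ridge regression on the +/-1 labels and counts how many coefficients are nonzero.

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=1.0):
    # K[i, j] = exp(-gamma * ||X1[i] - X2[j]||^2)
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

rng = np.random.default_rng(2)
N = 60
X = rng.normal(size=(N, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1.0, -1.0)   # hypothetical XOR-like labels

K = gaussian_kernel(X, X, gamma=1.0)
# LSSVM: kernel ridge regression applied to the +/-1 labels.
beta = np.linalg.solve(0.1 * np.eye(N) + K, y)

n_nonzero = int(np.sum(np.abs(beta) > 1e-8))
train_acc = np.mean(np.sign(K @ beta) == y)      # classify by the sign of g(x_n)
print(n_nonzero, "of", N, "coefficients are nonzero; training accuracy", train_acc)
```

Every nonzero coefficient is a kernel evaluation that prediction g(x) = \sum_n \beta_n K(x_n, x) has to pay for, which is the cost the slide calls "BIG g".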


Given this drawback of dense \beta in LSSVM, can we find a method that yields a sparse \beta, so that there are not too many SVs, while still matching the quality of soft-margin SVM? We now try to solve this problem.


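The method is called Tube Regression. We mark out a region (a neutral zone) above and below the regression line. Data points that fall inside this region are not counted as errors; only points outside the neutral zone contribute to the error.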





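Suppose the neutral zone has width 2\epsilon with \epsilon > 0. The error measure can then be written as err(y, s) = max(0, |s - y| - \epsilon), which corresponds to the distances marked in red in the figure.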


error measure:

$$\mathrm{err}(y, s) = \max(0, |s - y| - \epsilon)$$

• $|s - y| \le \epsilon$: 0
• $|s - y| > \epsilon$: $|s - y| - \epsilon$

usually called $\epsilon$-insensitive error with $\epsilon > 0$
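The \epsilon-insensitive error translates into a single expression. A minimal sketch follows; the function name `tube_error` and the sample values are illustrative.

```python
def tube_error(y, s, eps):
    """epsilon-insensitive error: zero inside the tube, linear outside it."""
    return max(0.0, abs(s - y) - eps)

inside = tube_error(1.0, 1.125, 0.25)   # |s - y| = 0.125 <= eps
outside = tube_error(1.0, 2.0, 0.25)    # |s - y| = 1.0 > eps
print(inside, outside)                  # prints 0.0 0.75
```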


This error is usually called the \epsilon-insensitive error. Its max form is similar to the hinge error measure introduced in the previous lecture. So our next step is to derive L2-regularized tube regression in the same way as soft-margin SVM, and thereby obtain a sparse \beta.


First, compare the tube regression error with the squared error:


tube: $\mathrm{err}(y, s) = \max(0, |s - y| - \epsilon)$

squared: $\mathrm{err}(y, s) = (s - y)^2$


Then plot err(y, s) against s for both:


[Figure: err(y, s) versus s for the squared error and the tube error]


tube ≈ squared when |s - y| small
& less affected by outliers


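In the figure, the red curve is the squared error and the blue curve is the tube error. When |s - y| is small, that is, when s is close to y, the squared error and the tube error are about the same size. Where |s - y| is large, the squared error grows much faster than the tube error. The faster an error grows, the more it is affected by noise, which makes the optimization less robust. From this point of view the tube regression error function is the better one.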
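The difference in growth is easy to tabulate. This small sketch assumes \epsilon = 0.25 and a handful of hypothetical gaps |s - y|; the function names are illustrative.

```python
def tube_error(y, s, eps):
    # epsilon-insensitive error
    return max(0.0, abs(s - y) - eps)

def squared_error(y, s):
    return (s - y) ** 2

eps = 0.25
rows = []
for gap in (0.5, 1.0, 4.0, 16.0):       # hypothetical values of |s - y|
    rows.append((gap, tube_error(0.0, gap, eps), squared_error(0.0, gap)))
    print(gap, rows[-1][1], rows[-1][2])
```

At a gap of 16 the squared error is 256 while the tube error is 15.75, so a single outlier pulls a squared-error fit much harder than a tube-error fit.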


Now we write down L2-Regularized Tube Regression:


$$\min_{w} \; \frac{\lambda}{N} w^T w + \frac{1}{N} \sum_{n=1}^{N} \max\left(0, |w^T z_n - y_n| - \epsilon\right)$$


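Because of the max term, this optimization problem is not differentiable everywhere, so it is not well suited to GD/SGD. It does satisfy the representer theorem, so it could be solved by introducing a kernel, but that gives no guarantee of a sparse \beta. Alternatively, we can convert the problem into a constrained QP and follow the dual SVM derivation: introduce the kernel and use the KKT conditions, which guarantee that the solution \beta is sparse.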


Regularized Tube Regr.

$$\min \; \frac{\lambda}{N} w^T w + \frac{1}{N} \sum \text{tube violation}$$

• unconstrained,
  but max not differentiable
• 'representer' to kernelize,
  but no obvious sparsity


standard SVM

$$\min \; \frac{1}{2} w^T w + C \sum \text{margin violation}$$

• not differentiable,
  but QP
• dual to kernelize,
  KKT conditions => sparsity





So we can write L2-Regularized Tube Regression in a form similar to SVM:


will mimic standard SVM derivation:

$$\min_{b, w} \; \frac{1}{2} w^T w + C \sum_{n=1}^{N} \max\left(0, |w^T z_n + b - y_n| - \epsilon\right)$$
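Note that the coefficients \lambda and C are inversely related: a larger \lambda corresponds to a smaller C, and a smaller \lambda to a larger C. This form also separates out w_0, that is b, which is consistent with how we derived the SVM solution earlier.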
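The inverse relation can be made explicit with one rescaling step. Setting the bias b aside, multiplying the \lambda-form objective by the positive constant N / (2\lambda) does not change its minimizer:

$$\frac{N}{2\lambda} \left( \frac{\lambda}{N} w^T w + \frac{1}{N} \sum_{n=1}^{N} \max(0, |w^T z_n - y_n| - \epsilon) \right) = \frac{1}{2} w^T w + \frac{1}{2\lambda} \sum_{n=1}^{N} \max(0, |w^T z_n - y_n| - \epsilon)$$

Matching this against the C-form gives C = 1 / (2\lambda) under these assumptions, which is the sense in which a large \lambda means a small C.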


We now have the initial form of Standard Support Vector Regression, but it is not yet a standard QP problem. We continue transforming the expression:


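mimicking standard SVM

$$\min_{b, w, \xi} \; \frac{1}{2} w^T w + C \sum_{n=1}^{N} \xi_n \quad \text{s.t.} \; |w^T z_n + b - y_n| \le \epsilon + \xi_n, \;\; \xi_n \ge 0$$

making constraints linear

$$\min_{b, w, \xi^{\vee}, \xi^{\wedge}} \; \frac{1}{2} w^T w + C \sum_{n=1}^{N} (\xi_n^{\vee} + \xi_n^{\wedge}) \quad \text{s.t.} \; -\epsilon - \xi_n^{\vee} \le y_n - w^T z_n - b \le \epsilon + \xi_n^{\wedge}, \;\; \xi_n^{\vee} \ge 0, \; \xi_n^{\wedge} \ge 0$$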


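The second form is a standard QP problem, in which \xi_n^{\wedge} and \xi_n^{\vee} capture the upper and lower tube violations respectively. This form is called the Support Vector Regression (SVR) primal:


$$\min_{b, w, \xi^{\vee}, \xi^{\wedge}} \; \frac{1}{2} w^T w + C \sum_{n=1}^{N} (\xi_n^{\vee} + \xi_n^{\wedge})$$

$$\text{s.t.} \quad -\epsilon - \xi_n^{\vee} \le y_n - w^T z_n - b \le \epsilon + \xi_n^{\wedge}$$

$$\xi_n^{\vee} \ge 0, \quad \xi_n^{\wedge} \ge 0$$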


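The standard QP form of SVR has two important parameters, C and \epsilon. C sets the trade-off between regularization and tube violation: a large C puts more weight on reducing tube violations, and a small C puts more weight on regularization. \epsilon sets the width of the tube, that is, the tolerance for errors: the larger \epsilon is, the more error is tolerated. \epsilon is a constant we choose, and it is specific to SVR; SVM has no such parameter. Finally, the QP form of SVR has \tilde{d} + 1 + 2N variables (with \tilde{d} the dimension of z) and 2N + 2N constraints.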


• parameter C: trade-off of regularization &
  tube violation

• parameter $\epsilon$: vertical tube width
  (one more parameter to choose!)


• QP of $\tilde{d} + 1 + 2N$ variables, 2N + 2N
  constraints
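To make the variable and constraint counts concrete, the sketch below assembles the SVR primal as a generic QP of the form min 0.5 u^T Q u + p^T u subject to A u <= c. The variable ordering u = [b, w, xi_lower, xi_upper], the function name `svr_primal_qp`, and the toy data are assumptions made for illustration. The code only builds the matrices and checks a simple feasible point; passing them to an actual QP solver is left out.

```python
import numpy as np

def svr_primal_qp(Z, y, C, eps):
    """Build (Q, p, A, c) for min 0.5*u^T Q u + p^T u  s.t.  A u <= c,
    where u = [b, w, xi_lower, xi_upper] (an assumed ordering)."""
    N, d = Z.shape
    n_var = 1 + d + 2 * N
    Q = np.zeros((n_var, n_var))
    Q[1:1 + d, 1:1 + d] = np.eye(d)                 # only w is regularized
    p = np.concatenate([np.zeros(1 + d), C * np.ones(2 * N)])

    I, O, one = np.eye(N), np.zeros((N, N)), np.ones((N, 1))
    A = np.vstack([
        np.hstack([-one, -Z, O, -I]),               # y_n - w.z_n - b <= eps + xi_upper_n
        np.hstack([one, Z, -I, O]),                 # w.z_n + b - y_n <= eps + xi_lower_n
        np.hstack([np.zeros((N, 1 + d)), -I, O]),   # xi_lower_n >= 0
        np.hstack([np.zeros((N, 1 + d)), O, -I]),   # xi_upper_n >= 0
    ])
    c = np.concatenate([eps - y, eps + y, np.zeros(2 * N)])
    return Q, p, A, c

rng = np.random.default_rng(3)
N, d, C, eps = 5, 2, 1.0, 0.1
Z = rng.normal(size=(N, d))     # assumed toy features
y = rng.normal(size=N)          # assumed toy targets
Q, p, A, c = svr_primal_qp(Z, y, C, eps)

# A feasible point: b = 0, w = 0, slacks equal to the tube violations of the zero predictor.
u0 = np.concatenate([np.zeros(1 + d), np.maximum(0.0, -y - eps), np.maximum(0.0, y - eps)])
feasible = bool(np.all(A @ u0 <= c + 1e-12))
obj0 = 0.5 * u0 @ Q @ u0 + p @ u0
print(A.shape, feasible, obj0)
```

The shape of A shows the 2N + 2N constraints and the d + 1 + 2N variables, and the objective at the feasible point equals C times the total tube error of the zero predictor.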





Support Vector Regression Dual 


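Now that we have the SVR primal, we derive the SVR dual. As with the SVM dual, we first introduce the Lagrange multipliers \alpha_n^{\wedge} and \alpha_n^{\vee}, one for each of the two tube inequalities. The Lagrange multipliers for \xi_n^{\vee} \ge 0 and \xi_n^{\wedge} \ge 0 are omitted here.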


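objective function $\frac{1}{2} w^T w + C \sum_{n=1}^{N} (\xi_n^{\vee} + \xi_n^{\wedge})$

Lagrange multiplier $\alpha_n^{\wedge}$ for $y_n - w^T z_n - b \le \epsilon + \xi_n^{\wedge}$
Lagrange multiplier $\alpha_n^{\vee}$ for $-\epsilon - \xi_n^{\vee} \le y_n - w^T z_n - b$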


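Then, following the same derivation and simplification as for SVM, we set the partial derivatives of the Lagrangian with respect to the relevant variables to zero and obtain the corresponding KKT conditions: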


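Some of the KKT Conditions

• $\frac{\partial L}{\partial w_i} = 0$: $w = \sum_{n=1}^{N} (\alpha_n^{\wedge} - \alpha_n^{\vee}) z_n$; $\frac{\partial L}{\partial b} = 0$: $\sum_{n=1}^{N} (\alpha_n^{\wedge} - \alpha_n^{\vee}) = 0$

• complementary slackness:

$$\alpha_n^{\wedge} (\epsilon + \xi_n^{\wedge} - y_n + w^T z_n + b) = 0$$

$$\alpha_n^{\vee} (\epsilon + \xi_n^{\vee} + y_n - w^T z_n - b) = 0$$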


Next, by observing how the terms of the SVM primal correspond to those of the SVM dual, we can go directly from the SVR primal to the SVR dual. (The detailed mathematical derivation is omitted here.)


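SVM primal:

$$\min_{b, w, \xi} \; \frac{1}{2} w^T w + C \sum_{n=1}^{N} \xi_n$$

$$\text{s.t.} \quad y_n (w^T z_n + b) \ge 1 - \xi_n, \quad \xi_n \ge 0$$


SVM dual:

$$\min_{\alpha} \; \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \alpha_n \alpha_m y_n y_m K(x_n, x_m) - \sum_{n=1}^{N} 1 \cdot \alpha_n$$

$$\text{s.t.} \quad \sum_{n=1}^{N} y_n \alpha_n = 0, \quad 0 \le \alpha_n \le C$$


SVR primal:

$$\min_{b, w, \xi^{\vee}, \xi^{\wedge}} \; \frac{1}{2} w^T w + C \sum_{n=1}^{N} (\xi_n^{\wedge} + \xi_n^{\vee})$$

$$\text{s.t.} \quad 1 (y_n - w^T z_n - b) \le \epsilon + \xi_n^{\wedge}$$

$$1 (w^T z_n + b - y_n) \le \epsilon + \xi_n^{\vee}$$

$$\xi_n^{\wedge} \ge 0, \quad \xi_n^{\vee} \ge 0$$


SVR dual:

$$\min_{\alpha^{\wedge}, \alpha^{\vee}} \; \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} (\alpha_n^{\wedge} - \alpha_n^{\vee}) (\alpha_m^{\wedge} - \alpha_m^{\vee}) K(x_n, x_m) + \sum_{n=1}^{N} \left( (\epsilon - y_n) \alpha_n^{\wedge} + (\epsilon + y_n) \alpha_n^{\vee} \right)$$

$$\text{s.t.} \quad \sum_{n=1}^{N} 1 \cdot (\alpha_n^{\wedge} - \alpha_n^{\vee}) = 0$$

$$0 \le \alpha_n^{\wedge} \le C, \quad 0 \le \alpha_n^{\vee} \le C$$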





Finally, we check whether the SVR solution really is sparse. From the SVR dual derivation above, the solution w is:


$$w = \sum_{n=1}^{N} (\alpha_n^{\wedge} - \alpha_n^{\vee}) z_n$$


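The corresponding complementary slackness conditions are:


$$\alpha_n^{\wedge} (\epsilon + \xi_n^{\wedge} - y_n + w^T z_n + b) = 0$$

$$\alpha_n^{\vee} (\epsilon + \xi_n^{\vee} + y_n - w^T z_n - b) = 0$$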


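For points strictly inside the tube, |w^T z_n + b - y_n| < \epsilon, so there is no violation and both \xi_n^{\vee} and \xi_n^{\wedge} are zero. The second factor in each complementary slackness equation is then nonzero, which forces \alpha_n^{\wedge} = 0 and \alpha_n^{\vee} = 0, that is, \beta_n = \alpha_n^{\wedge} - \alpha_n^{\vee} = 0.

So points inside the tube get \beta_n = 0, which is the sparsity we wanted. Only points on or outside the tube boundary can have \beta_n \ne 0. This gives us the sparse solution of SVR.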
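The sparsity can be observed on a toy problem. The sketch below is not the QP route described in the lecture. It makes two simplifying assumptions that should be read as such: the bias b is dropped, so the equality constraint disappears, and the dual is rewritten in terms of \beta_n = \alpha_n^{\wedge} - \alpha_n^{\vee}, which gives min 0.5 \beta^T K \beta - y^T \beta + \epsilon ||\beta||_1 with |\beta_n| <= C. That problem is solved here by plain coordinate descent. The function names, the data, and the values of C, \epsilon, and gamma are all illustrative.

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=1.0):
    # K[i, j] = exp(-gamma * ||X1[i] - X2[j]||^2)
    sq = np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def svr_dual_cd(K, y, C, eps, n_sweeps=200):
    """Coordinate descent on
        min_beta 0.5*beta^T K beta - y^T beta + eps*||beta||_1,  |beta_n| <= C,
    the bias-free SVR dual with beta = alpha_upper - alpha_lower (an assumed simplification)."""
    N = len(y)
    beta = np.zeros(N)
    for _ in range(n_sweeps):
        for n in range(N):
            # Target minus the prediction from all other coordinates.
            r = y[n] - K[n] @ beta + K[n, n] * beta[n]
            # Exact 1-D minimizer: soft-threshold by eps, rescale, clip to [-C, C].
            b = np.sign(r) * max(abs(r) - eps, 0.0) / K[n, n]
            beta[n] = min(max(b, -C), C)
    return beta

X = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]   # assumed toy inputs
y = np.sin(X[:, 0])                              # assumed toy targets
K = gaussian_kernel(X, X, gamma=1.0)
C, eps = 10.0, 0.2
beta = svr_dual_cd(K, y, C, eps)

objective = 0.5 * beta @ K @ beta - y @ beta + eps * np.sum(np.abs(beta))
n_sv = int(np.sum(beta != 0))
print(n_sv, "of", len(y), "points have nonzero beta")
```

Whenever a point's residual falls within \epsilon at its update, the soft-threshold sets its \beta_n to exactly zero, which mirrors the complementary slackness argument for points inside the tube.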


Summary of Kernel Models 


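This section summarizes all the kernel models we have covered. We introduced three linear models: PLA/pocket, regularized logistic regression, and linear ridge regression. Linear models of this kind can be handled with the LIBLINEAR library developed by Chih-Jen Lin at National Taiwan University.


We also introduced linear soft-margin SVM, whose error function is err_SVM and which is solved as a standard QP problem. Linear soft-margin SVM addresses the same problem as PLA/pocket. Then we introduced linear SVR, which addresses the same problem as linear ridge regression. It takes the SVM point of view, uses err_TUBE, and is converted into a QP problem. This was the main content of this lecture.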


First row:

• PLA/pocket: minimize err_0/1 specially
• linear SVR: minimize regularized err_TUBE by QP

Second row:

• linear soft-margin SVM: minimize regularized err_SVM by QP
• linear ridge regression: minimize regularized err_SQR analytically
• regularized logistic regression: minimize regularized err_CE by GD/SGD


second row: popular in LIBLINEAR


The models above can also be converted to dual form and kernelized. The overall map is as follows:


First row:

• PLA/pocket
• linear SVR

Second row:

• linear soft-margin SVM
• linear ridge regression
• regularized logistic regression

Third row:

• kernel ridge regression: kernelized linear ridge regression
• kernel logistic regression: kernelized regularized logistic regression

Fourth row:

• SVM: minimize SVM dual by QP
• SVR: minimize SVR dual by QP
• probabilistic SVM: run SVM-transformed logistic regression


fourth row: popular in LIBSVM


SVM, SVR, and probabilistic SVM can all be solved with the LIBSVM library developed by Chih-Jen Lin at National Taiwan University. In general, SVR and probabilistic SVM are the most commonly used of these models.


Summary


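This lecture introduced SVR. Using the representer theorem, we first turned ridge regression into its kernel form, kernel ridge regression, and derived its solution. That solution is dense: most of its entries are nonzero. So we defined tube regression and followed the SVM derivation to minimize the regularized tube error, converted it to the dual form, and obtained a sparse solution. Finally, we summarized all the kernel models covered so far and briefly described the characteristics of each. In practice, the model should be chosen to fit the problem at hand.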


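Note:
All figures in this article come from Hsuan-Tien Lin's Machine Learning Techniques course at National Taiwan University.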


